FastPacket: Towards Pre-trained Packets Embedding based on FastText for next-generation NIDS
Attackers use new attacks every day, but many of them go undetected by
intrusion detection systems (IDS), as most IDS ignore raw packet information
and rely only on basic statistical information extracted from PCAP files.
Using networking programs to extract fixed statistical features from packets
is useful, but may not be enough to meet today's challenges. We think that it
is time to use big data and deep learning for automatic, dynamic feature
extraction from packets. It is time to draw inspiration from pre-trained deep
learning models in computer vision and natural language processing, so that
deep learning solutions for security have their own models, pre-trained on big
datasets, for use in future research. In this paper, we propose a new approach
for embedding packets based on character-level embeddings, inspired by the
success of FastText on text data. We call this approach FastPacket. Results
are measured on subsets of the CIC-IDS-2017 dataset, but we expect promising
results from models pre-trained on big data. We suggest building a pre-trained
FastPacket model on the large MAWI dataset and making it available to the
community, similar to FastText, in order to outperform currently used NIDS and
to start a new era of packet-level NIDS that can better detect complex attacks.
Comment: arXiv admin note: text overlap with arXiv:2209.1396
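The character-level idea can be illustrated with a minimal sketch of FastText-style subword hashing applied to packet bytes. The abstract does not say how packets are tokenized, so everything here is an assumption: the payload is rendered as hex, cut into fixed-width "words", and each word is broken into character n-grams with `<` and `>` boundary markers. The names `char_ngrams`, `fnv1a`, and `HashedNgramEmbedder` are hypothetical, the embedding table is random rather than trained, and the separate whole-word vector that FastText also keeps is omitted.

```python
import random


def char_ngrams(token, n_min=3, n_max=5):
    """Character n-grams of a token wrapped in < and > boundary markers."""
    wrapped = f"<{token}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def fnv1a(text):
    """32-bit FNV-1a hash, used here to map n-grams to table rows."""
    h = 2166136261
    for b in text.encode("utf-8"):
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF
    return h


class HashedNgramEmbedder:
    """Untrained n-gram embedding table (assumption: random initial values)."""

    def __init__(self, dim=16, buckets=1000, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.buckets = buckets
        self.table = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                      for _ in range(buckets)]

    def embed_token(self, token):
        """Average the hashed n-gram vectors of one hex word."""
        grams = char_ngrams(token)
        vec = [0.0] * self.dim
        for g in grams:
            row = self.table[fnv1a(g) % self.buckets]
            for k in range(self.dim):
                vec[k] += row[k]
        return [v / len(grams) for v in vec]

    def embed_packet(self, payload, word_bytes=2):
        """Average the word vectors of a packet; word_bytes is an assumption."""
        hex_str = payload.hex()
        step = word_bytes * 2  # two hex characters per byte
        tokens = [hex_str[i:i + step] for i in range(0, len(hex_str), step)]
        if not tokens:
            return [0.0] * self.dim
        vec = [0.0] * self.dim
        for t in tokens:
            for k, v in enumerate(self.embed_token(t)):
                vec[k] += v
        return [v / len(tokens) for v in vec]


if __name__ == "__main__":
    embedder = HashedNgramEmbedder(dim=8, buckets=64, seed=1)
    print(char_ngrams("4500"))
    print(embedder.embed_packet(bytes.fromhex("4500003c1c46")))
```

In a trained model the table rows would be learned from a large packet corpus; the hashing step is what lets unseen byte patterns still receive a vector.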
ARNLI: ARABIC NATURAL LANGUAGE INFERENCE ENTAILMENT AND CONTRADICTION DETECTION
Natural Language Inference (NLI) is an active research topic in natural language processing, and contradiction detection between sentences is a special case of NLI. It is considered a difficult NLP task that has a large impact when added as a component to many NLP applications, such as question answering systems and text summarization. Arabic is one of the most challenging low-resource languages for contradiction detection due to its lexical richness and semantic ambiguity. We have created a dataset of more than 12k sentences, named ArNLI, that will be publicly available. Moreover, we have applied a new model inspired by the solutions Stanford proposed for contradiction detection in English. We propose an approach to detect contradictions between pairs of sentences in Arabic using a contradiction vector combined with a language model vector as input to a machine learning model. We analyzed the results of different traditional machine learning classifiers and compared them on our dataset (ArNLI) and on automatic translations of the English PHEME and SICK datasets. The best results were achieved using a Random Forest classifier, with accuracies of 99%, 60%, and 75% on PHEME, SICK, and ArNLI, respectively.
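The combination of a contradiction vector with a language model vector can be sketched as simple feature concatenation. The abstract does not list the actual features, so the three used here (negation mismatch, number mismatch, word overlap) are hypothetical stand-ins, as are the function names and the small negation list. The language model vectors are assumed to be precomputed elsewhere and are passed in as plain lists.

```python
import re

# Assumption: a small, incomplete set of Arabic negation particles.
NEGATION_PARTICLES = {"لا", "لم", "لن", "ليس"}


def contradiction_features(premise, hypothesis):
    """Return [negation_mismatch, number_mismatch, word_overlap]."""
    p_tokens, h_tokens = premise.split(), hypothesis.split()
    p_set, h_set = set(p_tokens), set(h_tokens)

    # 1.0 when exactly one sentence has an odd number of negation particles.
    neg_p = sum(t in NEGATION_PARTICLES for t in p_tokens)
    neg_h = sum(t in NEGATION_PARTICLES for t in h_tokens)
    negation_mismatch = float(neg_p % 2 != neg_h % 2)

    # 1.0 when both sentences contain numbers and the sets differ.
    nums_p = set(re.findall(r"\d+", premise))
    nums_h = set(re.findall(r"\d+", hypothesis))
    number_mismatch = float(bool(nums_p) and bool(nums_h) and nums_p != nums_h)

    # Jaccard overlap of the two token sets.
    union = p_set | h_set
    word_overlap = len(p_set & h_set) / len(union) if union else 0.0

    return [negation_mismatch, number_mismatch, word_overlap]


def build_input(premise, hypothesis, lm_vec_p, lm_vec_h):
    """Concatenate the contradiction features with |lm_vec_p - lm_vec_h|."""
    if len(lm_vec_p) != len(lm_vec_h):
        raise ValueError("language model vectors must have equal length")
    diff = [abs(a - b) for a, b in zip(lm_vec_p, lm_vec_h)]
    return contradiction_features(premise, hypothesis) + diff


if __name__ == "__main__":
    premise = "الولد يلعب في الحديقة"
    hypothesis = "الولد لا يلعب في الحديقة"
    print(build_input(premise, hypothesis, [0.1, 0.2], [0.3, 0.1]))
```

The resulting vector is the kind of input a traditional classifier such as a Random Forest could be trained on; the classifier itself is left out to keep the sketch free of dependencies.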
Mispronunciation Detection of Basic Quranic Recitation Rules using Deep Learning
In Islam, readers must apply a set of pronunciation rules called Tajweed
rules to recite the Quran in the same way that the angel Jibrael taught the
Prophet Muhammad. The traditional process of learning to apply these rules
correctly requires a licensed, highly experienced teacher to detect
mispronunciation. Due to the increasing number of Muslims around the world,
there are not enough Tajweed teachers today for every Muslim to practice
recitation daily. Therefore, much work has been done on automatic detection of
Tajweed mispronunciation to help readers recite the Quran correctly, more
easily and in less time than with traditional learning methods. All previous
works share three common problems. First, most of them
focused on machine learning algorithms only. Second, they used private datasets
with no benchmark to compare against. Third, they did not make full use of the
sequential order of the input data, even though a speech signal is a time
series. To overcome these problems, we propose a solution that combines
Mel-Frequency Cepstral Coefficient (MFCC) features with Long Short-Term Memory
(LSTM) neural networks, which exploit the time-series structure, to detect mispronunciation in
Tajweed rules. In addition, our experiments were performed on a public dataset,
the QDAT dataset, which contains more than 1500 voice recordings of correct and
incorrect recitation of three Tajweed rules (Separate stretching, Tight Noon,
and Hide). To the best of our knowledge, the QDAT dataset has not yet been used
in any research paper. We compared the performance of the proposed LSTM model
with the traditional machine learning algorithms used in state-of-the-art work.
The LSTM model, which uses the time series, clearly outperformed traditional
machine learning. The accuracy achieved by the LSTM on the QDAT dataset was
96%, 95%, and 96% for the three rules (Separate stretching, Tight Noon, and
Hide), respectively.
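To show why an LSTM suits a time series of MFCC frames, here is a minimal pure-Python sketch of an LSTM forward pass that reads the frames in order and emits one probability. It is not the paper's model: the class name `TinyLSTM`, the hidden size, and the 13 coefficients per frame are assumptions, the weights are random and untrained, and the frames are assumed to be MFCC vectors computed elsewhere.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with a sigmoid output unit (untrained sketch)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # One weight matrix and bias per gate:
        # i = input, f = forget, g = cell candidate, o = output.
        self.W = {g: mat(n_hidden, n_in + n_hidden) for g in "ifgo"}
        self.b = {g: [0.0] * n_hidden for g in "ifgo"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]

    def _gate(self, name, z, activation):
        return [activation(sum(w * x for w, x in zip(row, z)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def forward(self, frames):
        """Read frames in time order; return P(mispronounced) in (0, 1)."""
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        for x in frames:
            z = list(x) + h  # current frame concatenated with previous state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        logit = sum(w * hj for w, hj in zip(self.w_out, h))
        return sigmoid(logit)


if __name__ == "__main__":
    # Synthetic stand-in for 20 frames of 13 MFCC coefficients each.
    frames = [[0.1 * t + 0.01 * k for k in range(13)] for t in range(20)]
    model = TinyLSTM(n_in=13, n_hidden=8, seed=0)
    print(model.forward(frames))
    print(model.forward(frames[::-1]))  # reversed order gives another value
```

Because the hidden and cell states carry information from frame to frame, the output depends on the order of the frames, which a classifier fed only per-utterance averages cannot capture.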
Vulnerability Detection Using Two-Stage Deep Learning Models
Application security is an essential part of developing modern software, as
many attacks depend on vulnerabilities in software. The number of attacks is
increasing globally due to technological advancements. Companies must include
security in every stage of developing, testing, and deploying their software in
order to prevent data breaches. There are several non-AI-based methods for
detecting software vulnerabilities, such as Static Application Security Testing
(SAST) and Dynamic Application Security Testing (DAST). However, these
approaches have substantial false-positive and false-negative rates. On the
other hand, researchers have been interested in developing AI-based
vulnerability detection systems that employ deep learning models such as BERT
and BLSTM. In this paper, we propose a two-stage solution consisting of two
deep learning models for vulnerability detection in C/C++ source code. The
first stage is a CNN that detects whether the source code contains any
vulnerability (a binary classification model), and the second stage is a
CNN-LSTM that classifies the vulnerability into one of 50 different
vulnerability types (a multiclass classification model). Experiments were
conducted on the SySeVR dataset. Results show an accuracy of 99% for the first
stage and 98% for the second stage.
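The two-stage control flow can be sketched independently of the networks themselves. In this minimal example the CNN and CNN-LSTM are replaced by toy stand-in functions, so the names `two_stage_predict`, `toy_detector`, and `toy_classifier`, the 0.5 threshold, and the class index 7 are all hypothetical. Only the structure (binary gate first, 50-way classification second) follows the abstract.

```python
from typing import Callable, Optional, Sequence

NUM_CLASSES = 50  # number of vulnerability types stated in the abstract


def two_stage_predict(
    source: str,
    detector: Callable[[str], float],
    classifier: Callable[[str], Sequence[float]],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return None if stage 1 finds no vulnerability, else the stage 2 class."""
    # Stage 1: binary decision on whether the code is vulnerable at all.
    if detector(source) < threshold:
        return None
    # Stage 2: pick the highest-scoring vulnerability type.
    scores = classifier(source)
    return max(range(len(scores)), key=lambda k: scores[k])


def toy_detector(source: str) -> float:
    """Stand-in for the stage 1 CNN: flags one well-known unsafe call."""
    return 0.9 if "strcpy(" in source else 0.1


def toy_classifier(source: str) -> Sequence[float]:
    """Stand-in for the stage 2 CNN-LSTM: always favors arbitrary class 7."""
    scores = [0.0] * NUM_CLASSES
    scores[7] = 1.0
    return scores


if __name__ == "__main__":
    print(two_stage_predict("int x = 1;", toy_detector, toy_classifier))
    print(two_stage_predict("strcpy(dst, src);", toy_detector, toy_classifier))
```

Splitting the task this way means the multiclass model only ever sees code that the first stage has flagged, so it never has to represent a "no vulnerability" class.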